Method and apparatus for expanding audio data

ABSTRACT

Systems implementing the invention allow a user to time stretch an audio track without changing the pitch of the sound, and to produce optimal audible qualities of the output signal. The approach utilized in the invention relies on providing several time stretching methods, each one of which is selected based on one or more criteria of the audio data properties. One method relies on crossfading pairs of segments of audio data while running one segment backward every other repetition. The second time stretching method detects inaudible segments and inserts longer periods of audible data within those segments. The third method utilizes a reverb to create a reverb segment that is played after the original segment.

FIELD OF THE INVENTION

The invention relates to the field of audio data engineering. More particularly, the invention discloses a method and apparatus for expanding audio data.

BACKGROUND

Artisans with skill in the area of audio data processing utilize a number of existing techniques to modify audio data. Such techniques are used, for example, to introduce sound effects (e.g., adding echoes to a sound track), correct distortions due to faulty recording instruments (e.g., digitally master audio data recorded on old analog recording media), or enhance an audio track by removing noise.

One method to enhance an audio file involves lengthening the audio data. The process of lengthening or time stretching audio data allows users to expand data into places where it would otherwise fall short. For example, if a movie scene requires that an audio track be of a certain duration to fit a timing requirement and the audio track is initially too short, the audio data would need to be lengthened in a way that does not radically distort the sound of that data. Time stretching also provides a way to conceal errors in an audio signal, such as replacing missing or corrupted data with an extension of the audio signal that precedes the gap (or follows the gap).

One way to slow down or speed up playback of an audio track, or to take up a longer or shorter duration of time, involves changing the speed of playback. However, because sound carries information in the frequency domain, slowing down a waveform results in changing the wavelength of the sound. The human ear perceives such wavelength changes as a change in the pitch. To a listener, that change in the pitch is generally unacceptable.

Existing solutions for lengthening audio data, without modifying the pitch, take segments from within the audio data and insert copies of those segments repeatedly to create new, longer audio data.

There are at least two drawbacks to this prior art lengthening approach: 1) the human ear is very sensitive to such audio manipulations, as the outcome is perceived as having audible artifacts; and 2) the insertion of segments in the audio data frequently results in producing discontinuities that generate high frequency waveforms which are not adequately filtered by the low-pass filter that is in one way or another present in playback devices. The human ear perceives high-frequency artifacts as clicks. Furthermore, existing techniques require additional manipulations to mask the artifacts introduced by the insertion/repetition techniques. Some of these masking techniques attempt to hide the artifacts by fading the end of the inserted segments. Often, however, the human ear can perceive imperfections, even when masking techniques are applied. A solution that aims at time stretching audio data while preserving the pitch should avoid introducing artifacts through numerical manipulation of the audio data (e.g. numerical filters) to minimize any imperfections perceivable by the human ear.

There is a need for a method and apparatus for modifying the length of an audio track while preserving its audible qualities. Embodiments of the invention provide a method for “time stretching” an audio signal while keeping the pitch unchanged and optimizing the audible qualities.

DESCRIPTION OF THE DRAWINGS

FIGS. 1A and 1B illustrate waveforms of typical audio data as used in embodiments of the invention.

FIG. 2 is a flowchart that illustrates the steps involved in providing audio data expansion.

FIG. 3 shows plots of an audio data segment waveform and its local energy.

FIG. 4 is a block diagram illustrating the process by which a system embodying the invention expands audio data.

FIG. 5 is a flowchart diagram illustrating steps involved in the basic crossfading method used in embodiments of the invention.

FIG. 6 is a block diagram that illustrates the process by which a system embodying the invention builds a chain of crossfaded segments to achieve larger expansion ratios.

FIG. 7A illustrates the process by which a system embodying the invention builds a chain of crossfaded segments to achieve larger expansion ratios while preserving a high quality of audible audio data.

FIG. 7B illustrates a particular embodiment of the invention that allows a system to expand an original audio signal while preserving a high quality of audible audio data.

FIG. 8 is a flowchart diagram that illustrates steps involved in expanding an audio data segment using a backward/forward method in combination with the crossfading method in embodiments of the invention.

FIG. 9 is a flowchart diagram illustrating steps involved in time stretching audio data using a threshold based insertion method in embodiments of the invention.

FIG. 10 is a flowchart illustrating steps involved in utilizing a reverb to time stretch an audio segment in accordance with embodiments of the invention.

SUMMARY OF THE INVENTION

An embodiment of the invention relates to a method and apparatus for time stretching audio data. Systems embodying the invention provide multiple approaches to time stretching audio data by preprocessing the data and applying one or more time stretching methods to the audio data. Preprocessing the audio data involves one or more techniques for measuring the local energy of an audio segment, determining a method for forming audio data segments, then applying one or more methods depending on the type of the local energy of the audio signal. For example, one approach measures the local square (or sum thereof) of amplitudes of a signal. The system detects the spots where the energy amplitude is low. The low local energy amplitudes may occur rhythmically, as in the case of music audio data. The low local energy amplitudes may also appear frequently, with a significant difference between high-energy amplitudes and low-energy amplitudes, such as in the case of speech.

The system implements multiple methods for time stretching audio data. For example, when low energy amplitude occurrences are lasting and regular, a zigzag method is applied to the audio data. The zigzag method involves selecting a pair of low energy amplitude segments and cross-fading the segments in a sequence whereby in every other repetition a segment is run backward and cross-faded with the pairing segment run forward. The zigzag method also involves copying one of the segments alternately forward then backward between consecutive repetitions.

When the system detects frequent pauses, such as in speech or percussion, the system utilizes a method that inserts inaudible data within the segments of pause. Some audio signals can be time stretched with this method very successfully, particularly signals which have portions that are energetic (loud) and, ideally, portions that are silent. Such is the case for recordings of many percussive musical instruments, such as drums; here, nearly all of the energy of a segment may be concentrated in a very short loud section (the striking of the drum). Signals with no quiet section or of constant energy do not lend themselves to this technique.
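The pause-lengthening idea can be sketched as follows. This is only an illustration under stated assumptions: the function name `stretch_pauses`, the per-sample amplitude test, and the fixed number of silent samples added per pause are hypothetical choices, not details given in the disclosure.

```python
def stretch_pauses(samples, threshold, extra_per_pause):
    """Lengthen every quiet run by appending silent samples to it.

    Assumption: a sample is "quiet" when its absolute amplitude is below
    `threshold`, and each pause receives the same number of extra samples.
    """
    output = []
    in_pause = False
    for x in samples:
        quiet = abs(x) < threshold
        if in_pause and not quiet:
            # The pause has just ended: pad it before the next loud sample.
            output.extend([0.0] * extra_per_pause)
        output.append(x)
        in_pause = quiet
    if in_pause:
        # The signal ended inside a pause: pad that pause as well.
        output.extend([0.0] * extra_per_pause)
    return output


drum_hits = [1.0, 0.0, 0.0, 0.9, 0.01, -0.8]
stretched = stretch_pauses(drum_hits, threshold=0.05, extra_per_pause=2)
```

The loud samples are left untouched and stay in order; only the quiet runs grow, which is why a signal with no quiet run would come out unchanged.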

The system utilizes a reverberation based time stretch method of the invention on continuous-energy signals. The reverberation method involves utilizing a reverb means to create a reverb image of a segment, playing the segment, and joining the reverb segment at the end of it.
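As a rough illustration of appending a reverb image to a segment, the sketch below uses a single feedback comb filter as a stand-in for the reverb means. The comb filter, the function names, and the parameters `delay`, `feedback`, and `tail_length` are assumptions made for this example; the disclosure does not specify the reverb.

```python
def reverb_tail(segment, delay, feedback, tail_length):
    """Return the decaying tail a feedback comb filter produces after the segment ends."""
    total = len(segment) + tail_length
    wet = [0.0] * total
    for n in range(total):
        dry = segment[n] if n < len(segment) else 0.0
        echo = feedback * wet[n - delay] if n >= delay else 0.0
        wet[n] = dry + echo
    return wet[len(segment):]


def stretch_with_reverb(segment, delay, feedback, tail_length):
    """Play the segment unchanged, then join the reverb tail at its end."""
    return list(segment) + reverb_tail(segment, delay, feedback, tail_length)


segment = [1.0, 0.0, 0.0, 0.0]
longer = stretch_with_reverb(segment, delay=2, feedback=0.5, tail_length=4)
```

With a feedback gain below 1.0 the tail decays, so the added samples fade out rather than repeating the segment at full level.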

DETAILED DESCRIPTION

The invention discloses a method and apparatus for providing time stretching of audio data. In the following description, numerous specific details are set forth to provide a more thorough description of embodiments of the invention. It will be apparent, however, to one skilled in the art, that it is possible to practice the invention without these specific details. In other instances, well known features have not been described in detail so as not to obscure the invention.

Terminology

Throughout the following disclosure, any reference to a user alternately refers to a person using a computer application and/or to one or more automatic processes. The automatic processes may be any computer program, executing locally or remotely, that communicates with embodiments of the invention and that may be triggered following any predetermined event.

In the disclosure, any reference to a data stream may refer to any type of means that allows a computer to obtain data using one or more known protocols for obtaining data. In its simplest form, a data source is a location of a random access memory of a digital computer. Other forms for data streams comprise a flat file (e.g. text or binary file) residing on a file system. A data stream may also be a data stream through a network socket, a tape recorder/player, a radio-wave enabled device, a microphone or any other sensor capable of capturing audio data, an audio digitizing machine, any type of disk storage, a relational database, or any other means capable of providing data to a computer. Also, an input buffer refers to a location capable of holding data while in the process of executing the steps in embodiments of the invention. Throughout the disclosure, an input buffer, input audio data, and input data stream all refer to a data source. Similarly, an output buffer, output data, and output data stream all refer to an output of audio data, whether for storage or for playback.

Digital audio data are generally stored in digital formats on magnetic disks or tapes and laser readable media. The audio data may be stored in a number of file formats. One example of an audio file format is the Audio Interchange File Format (AIFF). This format stores the amplitude data stream and several audio properties such as the sampling rate and/or looping information. The system may embed audio data in a file that stores video data, such as a Moving Picture Experts Group (MPEG) format. The invention as disclosed herein may be enabled to handle any file format capable of storing audio data.

The invention described herein is set forth in terms of method steps and systems implementing the method steps. It will be apparent, however, to one with ordinary skill in the art that the invention may be implemented as computer software, i.e. computer program code capable of being stored in the memory of a digital computer and executed on a microprocessor, or as hardware, i.e. a circuit board based implementation (e.g. Field Programmable Gate Array, FPGA, based electronic components).

Audio Data and Waveforms

FIGS. 1A and 1B illustrate waveforms of typical audio data as used in embodiments of the invention. Audio data 110, as illustrated in FIG. 1A, is a ten (10) second piece of a music recording. The waveform of music recordings (e.g. 110) is generally characterized by transients (e.g. 106) representative of one or more instruments that keep a rhythmic beat at regular intervals (e.g. 104). Waveform 120 in FIG. 1B shows a magnified view of a small portion from plot 110 of FIG. 1A. Regions 125 and 126 correspond to two (2) successive beats. The beats (or transients) are generally characterized by a noticeably high amplitude (or energy), and a more complex frequency composition. Between beats, the waveform shows a steadier activity.

Waveforms of voice recordings also possess some descriptive characteristics that are distinct from those of music. For example, the waveform of voice data shows more pauses, and an absence of rhythmic activity. In the following disclosure, the invention describes ways to analyze the waveforms having transients caused by rhythmic beats in audio data. However, it will be apparent to one with ordinary skill in the art that the system may utilize similar techniques for analyzing voice data, or any other sources of audio data, to implement the invention.

FIG. 2 is a flowchart diagram that illustrates the overall steps involved in providing audio data expansion in embodiments of the invention. At step 210, a system embodying the invention analyzes the audio data to be expanded to detect one or more zones of least sensitivity to the method. Sensitivity defines the amount of artifacts the method is likely to introduce in the output signal. The system uses one or more criteria to detect zones ready for manipulation while introducing the least amount of artifacts in the output data. For example, the system is able to detect local energy values in signal amplitude and frequency domains, and determine the zones (or segments) of audio data within which one or more expansion methods may be applied without introducing audible artifacts. At step 220, the system selects one or more methods to achieve the results based on the audio characteristics determined at step 210. Three different methods for expanding an audio data segment (crossfading, including its zigzag variant; threshold-based insertion; and reverberation), as well as the favorable conditions in which each of the methods yields optimum results, are discussed below.

At step 230, the system applies the selected method to the input audio data and generates output audio data. Generally, the expansion method (or methods) utilizes one or more original buffers as input data and one or more output buffers. The system may use other buffers to store data for intermediary steps of data processing. At step 240, the system writes (or appends) the processed data to an output buffer.

FIG. 3 shows plots of an audio data segment waveform and its local energy, typically used in embodiments of the invention. Plot 120 shows a segment of audio data as shown in FIG. 1B. Plot 320 shows the energy corresponding to the audio data represented in 120. In this example, the system computes the energy using the square of each data point's amplitude. Plot 320 represents local energy by binning samples (e.g. by summing each five consecutive data points). The system can also utilize other methods for computing local energy. For example, instead of the square function, the system may compute the energy using the absolute value of data points, or any other method capable of representing the energy of a signal.
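The binned energy computation described for plot 320 can be sketched as follows. The function name `local_energy`, the non-overlapping bins, and the `use_abs` switch are assumptions chosen for illustration; only the squaring, the absolute-value alternative, and the five-sample bin come from the description above.

```python
def local_energy(samples, bin_size=5, use_abs=False):
    """Sum per-sample energy over consecutive, non-overlapping bins.

    By default each sample contributes its squared amplitude; with
    use_abs=True it contributes its absolute value instead.
    """
    energies = []
    for start in range(0, len(samples), bin_size):
        chunk = samples[start:start + bin_size]
        if use_abs:
            energies.append(sum(abs(x) for x in chunk))
        else:
            energies.append(sum(x * x for x in chunk))
    return energies


signal = [0.0, 0.5, -0.5, 1.0, -1.0, 0.1, -0.1, 0.0, 0.1, 0.0]
energy = local_energy(signal)                 # one value per five samples
energy_abs = local_energy(signal, use_abs=True)
```

Here the first bin covers a loud stretch and the second a quiet one, so the first energy value is far larger than the second under either measure.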

In one embodiment of the invention, the energy of an audio segment provides a mechanism for detecting zones that lend themselves to audio data manipulation while minimizing audible (or unpleasant) artifacts. For example, in FIG. 3 a simple threshold technique may enable the system to detect zones of activity such as 306, 307 and 308. Whereas zones 306 and 308 are zones of high (and more complex) activity, zone 307 presents a steadier activity. In embodiments of the invention, zone 307 provides segments where the system may optimally utilize expansion methods. For example, by repeatedly replicating smaller segments within zone 307, it is possible to expand an audio segment, up to a certain expansion ratio, without introducing unpleasant audible artifacts.
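A simple threshold pass over binned energy values, of the kind that could separate a steady zone such as 307 from high-activity zones such as 306 and 308, might look like the sketch below. The function name `find_quiet_zones`, the `min_bins` parameter, and the sample values are hypothetical.

```python
def find_quiet_zones(energy, threshold, min_bins=1):
    """Return (start, end) bin ranges, end exclusive, where energy stays below threshold."""
    zones = []
    start = None
    for i, e in enumerate(energy):
        if e < threshold:
            if start is None:
                start = i              # a quiet run begins here
        elif start is not None:
            if i - start >= min_bins:  # keep only runs that are long enough
                zones.append((start, i))
            start = None
    if start is not None and len(energy) - start >= min_bins:
        zones.append((start, len(energy)))
    return zones


energy = [9.0, 8.5, 0.4, 0.3, 0.5, 0.2, 7.9, 9.1, 0.1]
zones = find_quiet_zones(energy, threshold=1.0, min_bins=2)
```

Requiring a minimum run length is one way to skip quiet stretches that are too short to hold a useful crossfade region.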

One feature of the invention is the ability to slice the audio data in a manner that allows a system to identify the processing zones. The system may index processing zones (or slices) using the segment's amplitudes. In music audio data, the beats typically follow the music notes or some division thereof. The optimal zones are typically found in between beats.

Crossfading Method

Crossfading refers to the process where the system mixes two audio segments, while one is faded in and the second one is faded out.

Program Pseudo-Code 1

    for (i = 0; i < stretched_length; i++) {
        // floating-point division, so that fade_in ramps from 0.0 toward 1.0
        fade_in = (float) i / stretched_length;
        fade_out = 1.0 - fade_in;
        output_buffer[i] = fade_out * original_buffer[i]
                         + fade_in * original_buffer[original_length - stretched_length + i];
    }

Program Pseudo-Code 1 illustrates the basic time stretching crossfade method. “original_buffer” is a range of memory which holds one segment of the unprocessed signal; “original_length” is the length of the original segment in samples; “output_buffer” is a range of memory which holds the results of the crossfade calculations; “stretched_length” is the length of the resulting “output_buffer” segment in samples, which is larger than the “original_buffer” segment length; “fade_in” is a fraction that smoothly increases from 0.0 to 1.0; “fade_out” is a fraction that smoothly decreases from 1.0 to 0.0.

Program Pseudo-Code 1 uses a linear function for fade-in and fade-out. However, the fading function most frequently used is the square root. An embodiment of the invention utilizes a linear function that approximates a square root function to reduce the computation time. The invention may utilize other “equal power” pairs of functions (such as sine and cosine). In addition, the index for the faded-in portion (in the last line of code) exceeds the starting boundary, i.e. references values before the beginning of the buffer; such a negative index refers to samples from a previous segment's buffer. The code above illustrates the crossfade process applied to a single segment of audio. It is assumed, however, that a segment exists before and after this segment.
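To make the difference between linear and “equal power” fades concrete, the sketch below returns the gain pair for a given crossfade position and measures the summed squared gains. The function names `fade_pair` and `power` and the `kind` selector are assumptions for this example.

```python
import math


def fade_pair(position, kind="linear"):
    """Return (fade_out, fade_in) gains for a position between 0.0 and 1.0."""
    if kind == "linear":
        return 1.0 - position, position
    if kind == "sqrt":
        return math.sqrt(1.0 - position), math.sqrt(position)
    if kind == "sine":
        angle = position * math.pi / 2.0
        return math.cos(angle), math.sin(angle)
    raise ValueError("unknown fade kind: " + kind)


def power(gains):
    """Sum of squared gains; equal-power pairs keep this at 1.0."""
    return gains[0] ** 2 + gains[1] ** 2


midpoint_linear = power(fade_pair(0.5, "linear"))
midpoint_sqrt = power(fade_pair(0.5, "sqrt"))
midpoint_sine = power(fade_pair(0.5, "sine"))
```

At the midpoint the linear pair sums to a power of 0.5, while the square-root and sine/cosine pairs stay at 1.0, which is the property the term “equal power” refers to.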

FIG. 4 is a block diagram illustrating the process by which a system embodying the invention expands audio data. FIG. 4 illustrates an improved version of the basic crossfade method utilizing a combination of crossfading and copying. Specifically, the system copies a portion of the beginning of the segment (e.g. 422), a middle portion is then cross-faded, and a final portion (e.g. 424) is then copied, completing processing of the segment.

The system processes an input stream of audio data 410 in accordance with the detection methods described at step 210. The system divides the original audio signal 410 into short segments. In the example of FIG. 4, the system identifies a processing zone (e.g. starting at 420). The system may further analyze the processing zone and select one or more processing methods for expanding the audio data. After the data is processed, the system appends that data to an output buffer 450. In the example provided in FIG. 4, a first segment 422 and a second segment 424 are destined for copying without modification to the beginning and the end of the output buffer, respectively.

In FIG. 4, after the system copies segment 422 to the output buffer, the system cross-fades two segments 430 and 440. In the example of FIG. 4, segment 430 is faded out while segment 440 is faded in. For example, an audio signal is faded out (attenuated from full amplitude to silence) quickly (on the order of 0.03 seconds to 0.3 seconds) while the same audio signal is faded in from an earlier position, such that the end of the faded-in signal is delayed in time, thus making the audio signal appear to sound longer. The division into segments is such that the beginning of each super segment occurs at a regular rhythmic time interval. Each segment represents an eighth note or sixteenth note, for example. The crossfading method is detailed in U.S. Pat. No. 5,386,493, assigned to Apple Computer, Inc. and incorporated herein by reference.

Program Pseudo-Code 2

    crossfade_length = end_crossfade - begin_crossfade;
    for (i = 0; i < stretched_length; i++) {
        if (i < begin_crossfade) {
            // copy the first segment
            output_buffer[i] = original_buffer[i];
        } else if ((i >= begin_crossfade) && (i < end_crossfade)) {
            // crossfade within the segment
            fade_in = (float) (i - begin_crossfade) / crossfade_length;
            fade_out = 1.0 - fade_in;
            output_buffer[i] = fade_out * original_buffer[i]
                             + fade_in * original_buffer[original_length - stretched_length + i];
        } else if (i >= end_crossfade) {
            // copy the final segment
            output_buffer[i] = original_buffer[original_length - stretched_length + i];
        }
    }

Program Pseudo-Code 2 illustrates an improved “Copy-Crossfade-Copy” time stretch method. The segment is broken into three pieces: a copy section (e.g. 422), a middle crossfade section and a final copy section (e.g. 424). The result from crossfading segments 430 and 440 is a composite segment 446. This copy-crossfade-copy method works up to a stretch ratio of around 1.5; i.e. the new stretched audio signal can be up to 1.5 times as long as the original signal without significant artifacts being audible.
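The copy-crossfade-copy steps can be sketched in runnable form as below. The function name and the up-front range check are assumptions added for this example: the check requires the crossfade window to be placed so that no index falls outside the segment, which the pseudo-code leaves to the caller. A linear fade is used, as in the pseudo-code.

```python
def copy_crossfade_copy(original, stretched_length, begin_crossfade, end_crossfade):
    """Stretch one segment: copy the head, crossfade the middle, copy the tail."""
    original_length = len(original)
    shift = stretched_length - original_length  # number of samples to add
    # Assumed constraint: every index used below must stay inside the segment.
    if not (0 < shift <= begin_crossfade < end_crossfade <= original_length):
        raise ValueError("crossfade window does not fit inside the segment")
    crossfade_length = end_crossfade - begin_crossfade
    output = []
    for i in range(stretched_length):
        if i < begin_crossfade:
            output.append(original[i])              # copy the first section
        elif i < end_crossfade:
            fade_in = (i - begin_crossfade) / crossfade_length
            fade_out = 1.0 - fade_in
            # fade out the current position, fade in the same signal from earlier
            output.append(fade_out * original[i] + fade_in * original[i - shift])
        else:
            output.append(original[i - shift])      # copy the final section
    return output


ramp = [float(n) for n in range(10)]
stretched = copy_crossfade_copy(ramp, stretched_length=12,
                                begin_crossfade=4, end_crossfade=8)
```

With a ramp as input, the head and tail are copied verbatim and the crossfaded middle rises at half the original slope, which is where the two extra samples are absorbed.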

FIG. 5 is a flowchart diagram illustrating steps involved in the basic crossfading method used in embodiments of the invention. At step 510, a system embodying the invention copies one or more unedited segments of audio data from the original buffer to an output buffer. When the system reaches a crossfading segment, it computes a fade-out coefficient, using one or more fading functions described above, at step 530. At step 540, the system computes the fade-in coefficient. At step 550, the system computes the fade-out segment. For example, step 550 computes the product of a data sample from the original buffer segment 430, of FIG. 4, and a corresponding fade-out coefficient in 432. At step 560, the system computes the fade-in segment. For example, step 560 computes the product of a data sample from the original buffer segment 440, of FIG. 4, and a corresponding fade-in coefficient in 442.

At step 570, a system embodying the invention combines the fade-out segment and the fade-in segment to produce the output cross-faded segment. Combining the two segments typically involves adding the faded segments. However, the system may utilize other techniques for combining the faded segments. At step 580, the system copies the remainder of the unedited segments to the output buffer.

FIG. 6 is a block diagram that illustrates the process by which a system embodying the invention builds a chain of cross-faded segments to achieve larger expansion ratios. The example of FIG. 6 utilizes an input stream 410 such as the one described in FIG. 4. The input audio is analyzed, and segments suitable for applying one or more time stretching methods (e.g. starting at 420) are identified.

To achieve stretch ratios larger than the ones described above (i.e. one and a half times), additional crossfade-copy sections can be chained together to achieve the desired length. Research using empirical testing leading to the invention shows that repeating a middle crossfade-copy-crossfade section of maximum possible length is advantageous; thus the invention uses “begin_max_crossfade” and “end_max_crossfade” below. These values are defined positions within the range of the original buffer length, while “begin_crossfade1”, “begin_crossfade2” etc. (without the max in the middle of the name) are points in the new stretched buffer, which exceeds the length of the original buffer. Program Pseudo-Code 3 (below) shows how to create a sequence of copy-crossfade-copy-crossfade-copy-crossfade-copy.

Program Pseudo-Code 3

    crossfade_length = end_crossfade1 - begin_crossfade1;
    for (i = 0; i < stretched_length; i++) {
        if (i < begin_crossfade1) {
            // copy from original buffer to stretch buffer
            output_buffer[i] = original_buffer[i];
        } else if ((i >= begin_crossfade1) && (i < end_crossfade1)) {
            // first crossfade
            fade_in = (float) (i - begin_crossfade1) / crossfade_length;
            fade_out = 1.0 - fade_in;
            output_buffer[i] = fade_out * original_buffer[i]
                             + fade_in * original_buffer[begin_max_crossfade1 + i - begin_crossfade1];
        } else if ((i >= end_crossfade1) && (i < begin_crossfade2)) {
            // second copy
            output_buffer[i] = original_buffer[end_max_crossfade1 + i - end_crossfade1];
        } else if ((i >= begin_crossfade2) && (i < end_crossfade2)) {
            // second crossfade
            fade_in = (float) (i - begin_crossfade2) / crossfade_length;
            fade_out = 1.0 - fade_in;
            output_buffer[i] = fade_out * original_buffer[begin_max_crossfade2 + i - begin_crossfade2]
                             + fade_in * original_buffer[begin_max_crossfade1 + i - begin_crossfade2];
        } else if ((i >= end_crossfade2) && (i < begin_crossfade3)) {
            // third copy
            output_buffer[i] = original_buffer[end_max_crossfade1 + i - end_crossfade2];
        } else if ((i >= begin_crossfade3) && (i < end_crossfade3)) {
            // third crossfade
            fade_in = (float) (i - begin_crossfade3) / crossfade_length;
            fade_out = 1.0 - fade_in;
            output_buffer[i] = fade_out * original_buffer[begin_max_crossfade2 + i - begin_crossfade3]
                             + fade_in * original_buffer[original_length - stretched_length + i];
        } else if ((i >= end_crossfade3) && (i < stretched_length)) {
            // final copy
            output_buffer[i] = original_buffer[original_length - stretched_length + i];
        }
    }

In FIG. 6, the crossfading method is applied twice on an audio segment. A first application concerns segment 630, faded out with function 632, combined with segment 634, faded in with function 636. The result of the first crossfading is segment 652. A second crossfading concerns segment 640, faded out with function 642, combined with segment 644, faded in with function 646. The result of the second crossfading is segment 656. In between crossfading repetitions, unedited inter-segment copies 622, 654 and 624 are directly copied from the original audio data stream to the output buffer 650.

Although the crossfading method allows arbitrarily large time stretch ratios, the rapid repetition of the same short section of audio many times in a row may produce unpleasant audible artifacts. Artifacts sound similar to a buzz or rapid flutter.

FIG. 7A illustrates the process by which a system embodying the invention builds a chain of cross-faded segments to achieve larger expansion ratios while preserving a high quality of audible audio data. The invention provides a modification to the “Chained Copy-Crossfade-Copy” method (described in FIG. 6) that reverses every other crossfade-copy-crossfade section in time. The basic concept is that, in every other crossfade-copy-crossfade cycle, one (or both) of the cross-faded segments is run backward.

This back and forth, or “Zigzag”, approach produces better sounding audio streams for large stretch ratios because the repeated section is effectively twice as large relative to the ordinary Chained Copy-Crossfade-Copy method (“back and forth” is twice as long as “forth only”). Thus, the artifact that arises from rapid repetition of the same audio signal is reduced by up to half.

FIG. 7A shows two segments 1 and 4 determined to be unedited copy segments. Segments 1 and 4 are treated as 422 and 424 (in previous figures), and are copied from the input audio stream to the output audio stream. Segments 2 and 3 are examples of segments used to create stretched segments of the output stream in accordance with embodiments of the invention. Sequence 720 shows successive fade-out segments. Rightward pointing arrows in the sequence designate those segments used in a forward sense during the computation of the fade-out segment. Leftward pointing arrows designate segments used in a backward (reverse) sense during the computation of the fade-out segment. Likewise, in sequence 730 rightward and leftward pointing arrows designate forward and backward senses, respectively, during the computation of the fade-in segment. The designations “F” and “B” are also indications for whether a segment is used in a forward or backward sense, respectively.

Output stream 740 shows the result of the computation using forward and backward alternations when the number of repetitions is an even number. Output stream 750 is an example of the combination of crossfading techniques used for an odd number of repetitions.

Restricting the number of middle crossfade-copy sections to odd numbers (e.g. 1, 3, 5, 7, etc.) was found in research leading to the invention to improve the overall sound quality. This ensures a regular forward-backward-forward-backward-forward pattern; if even numbers of sections were allowed, irregular patterns such as forward-backward-forward-forward would result, which sound inferior.
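The effect of the odd-count restriction can be checked with a small sketch. It assumes, consistent with Program Pseudo-Code 4, that the first and last copies always run forward and that the middle sections alternate starting with a backward one; the function names are hypothetical.

```python
def copy_directions(middle_sections):
    """Playback direction of each copied section: head, middle sections, tail.

    Assumption: head and tail are forward ("F"); middle sections alternate
    backward ("B") and forward, starting with backward.
    """
    middle = ["B" if n % 2 == 0 else "F" for n in range(middle_sections)]
    return ["F"] + middle + ["F"]


def is_regular(directions):
    """True when no two neighbouring sections run in the same direction."""
    return all(a != b for a, b in zip(directions, directions[1:]))


odd_pattern = copy_directions(3)    # three middle sections
even_pattern = copy_directions(2)   # two middle sections
```

Under these assumptions an odd count yields strict alternation, while an even count leaves the last middle section and the final copy both running forward.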

FIG. 7B illustrates a particular embodiment of the invention that allows a system to expand an original audio signal while preserving a high quality of audible audio data. In the example of FIG. 7B, the system defines four (4) sub-segments 1, 2, 3 and 4 in an original data segment 760. The system defines the sub-segments by defining the boundaries 761, 762, 763 and 764 that indicate to a system the limits for conducting one or more types of data processing. The system computes the boundaries' values in a manner that prevents, for example, indices from pointing outside the buffer range. In the example of FIG. 7B, 761 defines the end of a sub-segment (1) which the system copies unedited to the output buffer 770. Boundaries 761 and 762 indicate a maximum beginning and a maximum ending of a first crossfading region (labeled 2). Likewise, boundaries 763 and 764 indicate a maximum beginning and a maximum ending of a second crossfading region (labeled 4). The maximum beginning and maximum ending define the positions within which the system selects the portions of a sub-segment to be crossfaded.

The system, in the example of FIG. 7B, generates the output buffer 770 by first copying sub-segments 1, 2 and 3 to the output buffer. The system then generates a first crossfaded portion using sub-segment 4 forward crossfaded with sub-segment 4 backward. The boundaries 771 and 772 define the beginning and end of the first crossfaded portion. The system then reverses sub-segment 3 and copies the reversed sub-segment to the output buffer. Then, the system generates a second crossfaded portion using sub-segment 2. The system uses sub-segment 2 forward crossfaded with itself backward. The boundary 773 defines the beginning of the second crossfaded portion. The system copies sub-segment 3 to the output buffer, then repeats the crossfading-copying process (i.e. generate first crossfaded portion, copy backward sub-segment 3, then generate second crossfaded portion and copy sub-segment 3), then copies sub-segment 4 to the output buffer.

Program Pseudo-Code 4 (below) shows an example of steps leading to expanding an audio data stream using the zigzag method in combination with the crossfading method.

Program Pseudo-Code 4

    crossfade_length = end_crossfade1 - begin_crossfade1;
    for (i = 0; i < stretched_length; i++) {
        if (i < begin_crossfade1) {
            // copy forward from original buffer to stretch buffer
            output_buffer[i] = original_buffer[i];
        } else if ((i >= begin_crossfade1) && (i < end_crossfade1)) {
            // first crossfade: fade out forward while fading in backward
            // (backward offset measured from begin_crossfade1, as in the third crossfade)
            fade_in = (float) (i - begin_crossfade1) / crossfade_length;
            fade_out = 1.0 - fade_in;
            output_buffer[i] = fade_out * original_buffer[i]
                             + fade_in * original_buffer[end_max_crossfade2 - (i - begin_crossfade1)];
        } else if ((i >= end_crossfade1) && (i < begin_crossfade2)) {
            // second copy: copy backward
            output_buffer[i] = original_buffer[begin_max_crossfade2 - (i - end_crossfade1)];
        } else if ((i >= begin_crossfade2) && (i < end_crossfade2)) {
            // second crossfade: fade out backward while fading in forward
            fade_in = (float) (i - begin_crossfade2) / crossfade_length;
            fade_out = 1.0 - fade_in;
            output_buffer[i] = fade_out * original_buffer[end_max_crossfade1 - (i - begin_crossfade2)]
                             + fade_in * original_buffer[begin_max_crossfade1 + i - begin_crossfade2];
        } else if ((i >= end_crossfade2) && (i < begin_crossfade3)) {
            // third copy is forward
            output_buffer[i] = original_buffer[end_max_crossfade1 + i - end_crossfade2];
        } else if ((i >= begin_crossfade3) && (i < end_crossfade3)) {
            // third crossfade: fade out forward while fading in backward
            fade_in = (float) (i - begin_crossfade3) / crossfade_length;
            fade_out = 1.0 - fade_in;
            output_buffer[i] = fade_out * original_buffer[begin_max_crossfade2 + i - begin_crossfade3]
                             + fade_in * original_buffer[end_max_crossfade2 - (i - begin_crossfade3)];
        } else if ((i >= end_crossfade3) && (i < begin_crossfade4)) {
            // fourth copy: copy backward
            output_buffer[i] = original_buffer[begin_max_crossfade2 - (i - end_crossfade3)];
        } else if ((i >= begin_crossfade4) && (i < end_crossfade4)) {
            // fourth crossfade: fade out backward while fading in final forward
            fade_in = (float) (i - begin_crossfade4) / crossfade_length;
            fade_out = 1.0 - fade_in;
            output_buffer[i] = fade_out * original_buffer[end_max_crossfade1 - (i - begin_crossfade4)]
                             + fade_in * original_buffer[original_length - stretched_length + i];
        } else if ((i >= end_crossfade4) && (i < stretched_length)) {
            // final copy
            output_buffer[i] = original_buffer[original_length - stretched_length + i];
        }
    }

Zigzag Method

FIG. 8 is a flowchart diagram that illustrates steps involved in expanding an audio data segment using the backward/forward method in combination with the crossfading method in embodiments of the invention. At step 810, a system embodying the invention copies the first unedited segment from the original buffer to the output buffer (e.g. 422 and 424 in previous examples of FIGS. 6 and 7). At step 820, the system computes and combines the fade-out and fade-in segments following the basic steps described in the flowchart of FIG. 5. The computations that occur at each repetition involve computing the fading coefficient for each of the fade-out and fade-in segments. The system then computes the product of the fade-out segment with the fade-out coefficient and the product of the fade-in segment with the fade-in coefficient, and then sums the results of the two computations into a single crossfaded segment. At step 830, the system copies an unedited segment between the first crossfaded segment and a second crossfaded segment. At step 840, the system computes and combines a fade-out segment backward and a fade-in segment forward, following the basic steps of computing fading functions. However, while computing the fade-out segment, the system reverses the sense in which the segment is used (i.e. the last data samples of the segment are used at the beginning of the faded-out segment).

At step 850, the system embodying the invention copies backward a third unedited segment from the original buffer to the output buffer. At step 860, the system computes and combines a faded-out segment forward and a faded-in segment backward. At step 870, the system copies backward a fourth unedited segment from the original audio stream to the output buffer. At step 880, the system computes and combines a faded-out segment backward and a faded-in segment forward. At step 890, the system copies an unedited final segment from the original audio stream to the output buffer.
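A single crossfade in which one segment is read backward (as at steps 840 and 880) can be sketched in C as follows. This is only an illustrative sketch: the function name crossfade_backward_to_forward, its parameter layout, and the choice of a linear fade are assumptions made for the example and are not taken from the figures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes `len` crossfaded samples into `out`.
 * The fade-out source is read BACKWARD starting at src[out_start];
 * the fade-in source is read FORWARD starting at src[in_start].
 * A linear fade is assumed. */
void crossfade_backward_to_forward(const float *src, size_t out_start,
                                   size_t in_start, size_t len, float *out)
{
    /* The backward read must not run past the start of the buffer. */
    assert(len > 0 && out_start + 1 >= len);

    for (size_t k = 0; k < len; k++) {
        float fade_in = (float)k / (float)len;   /* rises from 0 toward 1 */
        float fade_out = 1.0f - fade_in;         /* falls from 1 toward 0 */
        out[k] = fade_out * src[out_start - k]   /* reversed segment */
               + fade_in * src[in_start + k];    /* forward segment  */
    }
}
```

Swapping the two index expressions gives the complementary case (fade out forward while fading in backward, as at step 860).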

Both the Chained Copy-Crossfade-Copy and the Zigzag Chained Copy-Crossfade-Copy methods can be improved by adjusting the positions of begin_max_crossfade1, end_max_crossfade1, begin_max_crossfade2 and end_max_crossfade2 (which define the boundaries of the repeated section) for each individual audio segment to minimize audio artifacts. Ideally, the middle section, which is repeated many times, should have a constant “energy”, i.e. no part of this region should sound louder than any other part. By dividing a segment into smaller sections and calculating the energy of each of these sections, it is possible to locate the portion of the segment that has a relatively constant energy. The system moves the positions of begin_max_crossfade1 and end_max_crossfade1 to the beginning of this stable region and moves begin_max_crossfade2 and end_max_crossfade2 to the end of the region. Various methods can calculate an energy value (as described in FIG. 3): one efficient approach is to sum the squares of the samples in a region; another is to sum their absolute values.
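The search for a region of relatively constant energy can be sketched as follows. This is a minimal illustration under stated assumptions: the names section_energy and find_stable_region are hypothetical, the energy is the sum of squared samples, and "relatively constant" is interpreted here as the run of `window` consecutive sections with the smallest difference between its highest and lowest section energy. The text does not prescribe this particular criterion.

```c
#include <assert.h>
#include <stddef.h>

/* Energy of one section: sum of the squares of its samples. */
static double section_energy(const float *x, size_t n)
{
    double e = 0.0;
    for (size_t i = 0; i < n; i++)
        e += (double)x[i] * (double)x[i];
    return e;
}

/* Hypothetical helper: splits `seg` into sections of `section_len` samples
 * and returns the index of the first section of the run of `window`
 * consecutive sections whose energies differ least (max minus min). */
size_t find_stable_region(const float *seg, size_t seg_len,
                          size_t section_len, size_t window)
{
    assert(section_len > 0 && window > 0);
    size_t n_sections = seg_len / section_len;
    assert(n_sections >= window);

    size_t best = 0;
    double best_spread = -1.0;   /* negative means "no candidate yet" */
    for (size_t s = 0; s + window <= n_sections; s++) {
        double lo = section_energy(seg + s * section_len, section_len);
        double hi = lo;
        for (size_t w = 1; w < window; w++) {
            double e = section_energy(seg + (s + w) * section_len, section_len);
            if (e < lo) lo = e;
            if (e > hi) hi = e;
        }
        if (best_spread < 0.0 || hi - lo < best_spread) {
            best_spread = hi - lo;
            best = s;
        }
    }
    return best;
}
```

With the returned index s, the stable region spans samples s * section_len up to (s + window) * section_len, which is where the max-crossfade boundaries would be moved.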

Threshold Insertion Method

Embodiments of the invention utilize a threshold detection method to find portions of the audio stream where the energy is low enough to qualify as silence. A noise gate would typically block out portions of low energy. A noise gate is a simple signal processor used to remove unwanted noise from a recorded audio signal. A noise gate computes the energy of the incoming audio signal and mutes the signal if the energy is below a user-defined threshold. If the signal is louder than the threshold, it is simply passed or copied to the output of the noise gate. Embodiments of the invention use the portions of silence/pause to introduce longer periods of silence into the audio stream. These portions are lengthened by adding inaudible valued samples until the desired new length is achieved. Some audio signals can be time stretched with this method very successfully, particularly signals which have portions that are energetic (loud) and, ideally, portions that are silent. Such is the case for recordings of many percussive musical instruments, such as drums; here, nearly all of the energy of a segment may be concentrated in a very short loud section (the striking of the drum). Signals with no quiet section or of constant energy do not lend themselves to this technique.

A common feature in voicemail systems is a “silence remover”, i.e. a mechanism for removing pauses between words in order to conserve memory and to allow the user to listen more quickly to a recorded message. Since background noise is commonly present on recordings, the “silent” pauses to be removed are not completely silent but instead have a finite but low energy compared to the desired speech signal. The system may apply a noise gate to the original signal, but instead of muting quiet portions of the signal, this modified noise gate simply deletes the quiet portions, thus saving memory.

FIG. 9 is a flowchart diagram illustrating steps involved in time stretching audio data using a threshold based insertion method in embodiments of the invention. At step 910, a system embodying the invention reads a data sample from the input buffer of audio data. At step 920, the system compares the absolute value of the sample (or the result of a mathematical expression thereof) to a threshold value. If the sample's value is greater than or equal to the threshold value, the system writes the data sample to the output buffer at step 930. If the sample value is smaller than the threshold value, the system inserts inaudible values in the output buffer at step 940. The amount of data inserted can be predetermined as a function of the desired stretching ratio, the length of the silence period, and any other parameter that the user may choose to enter. Examples of parameters for stretching (or not stretching) an audio segment include pauses whose removal would make a speech less intelligible. At step 950, the system tests for the end of the audio data. If the test does not detect the end of the audio data, the system continues with step 920; otherwise the system stops the process at step 960.
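The loop of FIG. 9 can be sketched in C as follows. This is a simplified illustration, not the claimed implementation: the function name stretch_by_threshold and the parameter zeros_per_quiet are hypothetical, the "inaudible value" is assumed to be a zero-valued sample, and the amount inserted is assumed to be a fixed number of zeros per below-threshold sample rather than a function of the stretching ratio and the length of the silence period.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies loud samples unchanged and replaces each
 * quiet sample with `zeros_per_quiet` zero-valued samples, so that quiet
 * stretches become longer. Returns the number of samples written,
 * which never exceeds `out_cap`. */
size_t stretch_by_threshold(const float *in, size_t in_len,
                            float threshold, size_t zeros_per_quiet,
                            float *out, size_t out_cap)
{
    assert(zeros_per_quiet >= 1);

    size_t n = 0;
    for (size_t i = 0; i < in_len; i++) {               /* steps 910 and 950 */
        float mag = in[i] < 0.0f ? -in[i] : in[i];      /* absolute value */
        if (mag >= threshold) {                         /* step 920 */
            if (n == out_cap)
                break;                                  /* output buffer full */
            out[n++] = in[i];                           /* step 930 */
        } else {
            for (size_t z = 0; z < zeros_per_quiet && n < out_cap; z++)
                out[n++] = 0.0f;                        /* step 940 */
        }
    }
    return n;
}
```

For example, with zeros_per_quiet set to 3, every run of quiet samples becomes three times as long while the loud portions keep their original duration and pitch.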

Artificial Reverberation Method

Artificial reverberators (or “reverbs”) process an audio signal to make it sound as though the audio signal is being played in an actual room, such as a concert hall. A reverb achieves this acoustic embellishment by adding to the signal a myriad of randomly timed echoes that get quieter over a short time, typically one to five seconds. For example, a single note sung into a reverb will continue ringing or sounding even after the singer has stopped.

Embodiments of the invention utilize one or more reverb methods to expand audio data segments. Reverb provides a way to time stretch an audio signal without the signal sounding “reverberated”.

FIG. 10 is a flowchart illustrating steps involved in utilizing a reverb to time stretch an audio segment in accordance with embodiments of the invention. At step 1010, a system embodying the invention inputs a segment to a reverb while the output of the reverb is not included in the processed signal until the end of the original un-stretched segment is reached. At step 1020, the system obtains a reverb segment. A reverb segment is a segment having the characteristics of one or more echoes of the original segment. A reverb may be a physical device enabled to be interfaced with an embodiment of the invention, or may be a software system (e.g. software component, or application) capable of generating a reverb segment. At step 1030, the system plays the original segment. Playing a segment may be simply feeding the segment to a buffer for storing audio data, or directly feeding the segment to an acoustics system. At step 1040, the system embodying the invention feeds the reverb segment to the output, which results in expanding the original segment without producing an audible artifact of reverberation. These steps are then repeated for the next segment in the audio stream.
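The steps of FIG. 10 can be sketched in C as follows. This is an illustrative sketch under strong assumptions: a single feedback comb filter stands in for the reverb (a real reverb combines many such echoes), and the names stretch_with_reverb_tail and MAX_DELAY, the delay length, and the feedback gain are all hypothetical. The reverb is fed the segment while only the dry signal is output; once the segment ends, the decaying reverb output is appended.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DELAY 4096   /* assumed upper bound on the echo delay, in samples */

/* Hypothetical helper: writes `out_len` samples. The first `in_len` samples
 * are the unmodified original segment; the remainder is the tail of a
 * feedback comb filter, wet[n] = dry[n] + feedback * wet[n - delay]. */
void stretch_with_reverb_tail(const float *in, size_t in_len,
                              size_t delay, float feedback,
                              float *out, size_t out_len)
{
    assert(delay > 0 && delay <= MAX_DELAY);
    assert(feedback >= 0.0f && feedback < 1.0f);   /* echoes must decay */
    assert(out_len >= in_len);

    float line[MAX_DELAY] = {0};   /* circular delay line of past wet samples */
    size_t pos = 0;

    for (size_t n = 0; n < out_len; n++) {
        float dry = (n < in_len) ? in[n] : 0.0f;   /* silence after the segment */
        float wet = dry + feedback * line[pos];    /* line[pos] is wet[n - delay] */
        line[pos] = wet;
        pos = (pos + 1) % delay;

        /* Steps 1010 and 1030: original only. Step 1040: reverb segment. */
        out[n] = (n < in_len) ? dry : wet;
    }
}
```

Because the reverb output is withheld while the original segment plays, the listener hears the dry signal followed by its decaying echoes instead of a reverberated version of the whole segment.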

The reverberation based time stretch method of the invention works best on continuous-energy signals, and not as well on percussive signals, thus complementing the noise gate time stretch method discussed above.

Thus a method and apparatus for time stretching audio data that utilizes a detection mechanism to segment the audio data and select one of multiple ways of stretching the audio data has been presented. The artificial reverb based method, as well as the crossfade method, can be used in error concealment as well. The goal in this area of technology is to synthesize data that is missing or corrupted. Current techniques include frequency analysis of audio sections that directly precede and follow the missing data, and subsequent synthesis of the missing data. Such approaches are computationally intensive, while simpler approaches such as merely repeating previous good data sound inferior. The reverberation time stretch method can sound as good as frequency analysis methods, with significantly less computation required.

1. A method for time stretching audio data without changing the pitch comprising: obtaining at least one audio data stream; obtaining at least one energy property representation of said at least one audio data stream; obtaining at least one optimal input segment for time stretching using said at least one energy property representation; defining a first segment and a second segment that at least overlap said optimal input segment; and generating an output segment by sequentially crossfading said first segment and said second segment; wherein sequentially crossfading comprises a first crossfading of said first segment and said second segment while reversing the sense of said first segment and a second crossfading of said first segment and said second segment while reversing the sense of said second segment.
2. The method of claim 1, wherein said obtaining at least one energy property representation further comprises computing a square of the amplitude of data samples in said audio stream.
3. The method of claim 1, wherein said obtaining said at least one optimal input segment further comprises obtaining a plurality of adjacent segments in said audio stream.
4. The method of claim 1, wherein said defining said first segment and said second segment further comprises defining a plurality of boundaries associated with said first segment and said second segment.
5. The method of claim 1, wherein said defining said plurality of boundaries further comprises defining boundaries for copying unedited audio segments.
6. The method of claim 1, wherein said crossfading said first segment and said second segment further comprises computing a fade-out coefficient and a fade-in coefficient.
7. The method of claim 6, wherein said crossfading said first segment and said second segment further comprises computing a first product of said first segment with said fade-out coefficient and a second product of said second segment and said fade-in coefficient.
8. The method of claim 7, wherein said crossfading said first segment and said second segment further comprises summing said first product and said second product.
9. The method of claim 1, wherein said reversing the sense of said at least one of said first segment and said second segment further comprises running an index from the end of said at least one of said first segment and said second segment.
10. The method of claim 1, wherein said sequentially crossfading further comprises copying at least a portion of unedited data from said data stream to said output segment.
11. A computer-readable medium carrying one or more sequences of instructions executable on a computer for time stretching audio data without changing the pitch, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of: obtaining at least one audio data stream; obtaining at least one energy property representation of said at least one audio data stream; obtaining at least one optimal input segment for time stretching using said at least one energy property representation; defining a first segment and a second segment that at least overlap said optimal input segment; and generating an output segment by sequentially crossfading said first segment and said second segment; wherein sequentially crossfading comprises a first crossfading of said first segment and said second segment while reversing the sense of said first segment and a second crossfading of said first segment and said second segment while reversing the sense of said second segment.
12. An apparatus comprising: a network interface; a memory; and one or more processors connected to the network interface and the memory, the one or more processors configured for obtaining at least one audio data stream; obtaining at least one energy property representation of said at least one audio data stream; obtaining at least one optimal input segment for time stretching using said at least one energy property representation; defining a first segment and a second segment that at least overlap said optimal input segment; and generating an output segment by sequentially crossfading said first segment and said second segment; wherein sequentially crossfading comprises a first crossfading of said first segment and said second segment while reversing the sense of said first segment and a second crossfading of said first segment and said second segment while reversing the sense of said second segment.